METHODS AND APPARATUS FOR DATA RETRIEVAL

BACKGROUND

Various types of data handling systems are known in the related arts. One type of data handling system generally includes one or more remotely located data generation entities (e.g., field offices having file servers, user computers, etc.) that are coupled in data communication with a local data retrieval and storage entity (e.g., a main office having large storage facilities, mainframe computers, etc.).

Under such a system, the local entity typically retrieves (i.e., gathers) data files from the remote entity or entities at fixed intervals of time for storage, processing, or other tasks. Upon (or shortly after) retrieval of the data files, each remote entity generally deletes its copies of the retrieved files so that storage space within the remote entity can be reclaimed for future data files or other usage.

However, the rate at which new data files are generated (i.e., drafted or created) typically varies within each remote entity due, for example, to present workload, holiday or special periods, phase of business cycle, etc. As a result, the rate at which data file storage space is consumed within the remote entity varies correspondingly. This can lead to insufficient data file storage space within the remote entity if the fixed interval between data file retrievals by the local entity is too long for the present rate of data file generation.

Thus, it is desirable to provide methods and apparatus for use with data handling systems that address the problems described above.

SUMMARY

One embodiment provides for a method of retrieving data, including the steps of waiting for a predefined interval of time, retrieving a first quantity of data from a remote entity after the predefined interval of time, and redefining the interval of time in accordance with a predefined function.

Another embodiment provides for a computer-accessible storage medium including executable program code. The program code is configured to cause a processor to wait for a predefined interval of time, and thereafter retrieve a first quantity of data. The program code is further configured to cause the processor to redefine the interval of time in accordance with a predefined function.

Yet another embodiment provides for a data system, including a remote entity configured to store data, and a user computer coupled in data communication with the remote entity. The user computer is configured to generate and store data within the remote entity. The data system further includes a local entity coupled in data communication with the remote entity. The local entity is configured to wait for a predefined interval of time, and to retrieve a first quantity of data from the remote entity after the predefined interval of time. The first quantity of data defines a retrieval quantity. The local entity is also configured to divide the predefined interval of time by the retrieval quantity to define a data creation period, and then to multiply the data creation period by a predefined quantity to redefine the interval of time. The local entity is further configured to wait for the redefined interval of time, and thereafter to retrieve a second quantity of data from the remote entity.



These and other aspects and embodiments will now be described in detail with reference to the accompanying drawings, wherein:

DESCRIPTION OF THE DRAWINGS

Fig. 1 is an exemplary time sequence diagram depicting a data retrieval method according to the prior art.

Fig. 2 is a block diagram depicting a data handling system in accordance with an embodiment of the present invention.

Fig. 3 is an exemplary time sequence diagram depicting a method in accordance with another embodiment of the present invention.

Fig. 4 is a flowchart depicting a method in accordance with still another embodiment of the present invention.

DETAILED DESCRIPTION

In representative embodiments, the present teachings provide methods and apparatus for retrieving data using a local entity in correspondence to the rate at which the data is being generated within a remote entity. As used herein, a remote entity can generally be defined as any device or system that is usable to generate quantities of data and/or store that data in electronic form (i.e., a data file or files) in preparation for retrieval by a local entity. In turn, a local entity generally refers to any device or system in accordance with the present invention that is usable to electronically retrieve data generated by one or more remote entities.

Thus, remote and local entities can be respectively defined by a relatively wide variety of devices such as servers, user computers, computer-accessible file storage arrays, etc. As further used herein, a main office generally refers to the location of a local entity, while remote entities are generally respectively associated with (i.e., located within) field offices or other locations that are considered remote with respect to the main office. The term 'office' is used herein to exemplify the sort of usage environment typical of the present invention, but is in no way intended to limit the use or application of the present invention to office environments in the conventional sense.

Therefore, broadly speaking, the present invention is generally directed to the systematic retrieval of electronic data files from one or more remote entities respectively located within a remote office or offices, by a local entity within a main office, by way of any suitable electronic communications infrastructure. Typical such infrastructures include, for example, the Internet, a local area network (LAN), a wide area network (WAN), etc.

Fig. 1 is an exemplary time sequence diagram depicting a data retrieval method 20 according to the prior art. The diagram of method 20 includes a first office timeline 22 and a second office timeline 24, each depicting a time-sequence retrieval of data. The diagram of method 20 further includes time instances T0, T1, T2, T3 and T4, at each of which a corresponding plurality of data files 26 is retrieved, as depicted on the first office timeline 22 and the second office timeline 24, respectively. The method 20, as exemplified in Fig. 1, is performed as follows: at a time T0, a quantity of five data files 26 is retrieved from the first office (not shown), as depicted on the timeline 22. At this same time T0, no data files 26 are retrieved from the second office (not shown), as depicted on the timeline 24. Thus, at time T0, the first office has five files 26 ready to be retrieved, while the second office has none.



Then, the method 20 waits for a fixed interval of time "TI". As depicted in Fig. 1, the interval of time TI is six units in length, with units defined by any suitable unit of time such as minutes, hours, days, etc. For purposes herein, it is assumed that each interval of time TI is equal to six hours.

Then, after waiting for the interval of time TI, the method 20 retrieves three data files 26 from the first office (timeline 22) and two data files 26 from the second office (timeline 24) at time T1. Thereafter, the method 20 again waits for the fixed interval of time TI, or six hours.

The method 20 then continues in a generally iterative 'retrieve-and-wait' fashion, substantially as described above, gathering data files 26 in quantities of four, four, and six from the first office (timeline 22), and in quantities of one, two, and two from the second office (timeline 24) at times T2, T3 and T4, respectively.
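For contrast with the embodiments described below, the prior-art loop of method 20 can be sketched in Python. This is a minimal illustration under stated assumptions, not a description of any particular product. `fetch` is a hypothetical callable that retrieves whatever files an office has waiting and returns how many were received. `sleep` is injectable so that the waits can be observed. The point the sketch makes is that the wait never depends on the value `fetch` returns.

```python
import time
from typing import Callable, List


def fixed_interval_retrieval(fetch: Callable[[], int], interval: float, iterations: int,
                             sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Prior-art 'retrieve-and-wait' loop of method 20 (Fig. 1).

    `fetch` is a hypothetical callable that retrieves whatever files an office
    has waiting and returns how many were received. The wait is always the same
    fixed interval TI, whatever `fetch` returned."""
    counts = []
    for i in range(iterations):
        counts.append(fetch())      # retrieve at T0, T1, T2, ...
        if i < iterations - 1:
            sleep(interval)         # fixed interval TI, never adjusted
    return counts


if __name__ == "__main__":
    # Replay the first office's counts from Fig. 1 without actually sleeping.
    waits: List[float] = []
    office1 = iter([5, 3, 4, 4, 6])
    print(fixed_interval_retrieval(lambda: next(office1), 6, 5, sleep=waits.append))
    print(waits)
```

The recorded waits are 6, 6, 6 and 6, whether the office had five files waiting or none.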



Case 200300426-1 



As depicted in Fig. 1, the method 20 does not take into account the number of data files 26 (i.e., quantity of data) retrieved from any particular office and/or at any particular time when determining the fixed interval of time TI. As such, the method 20 tends to result, from time to time, in an undesirable excess of data files 26 awaiting retrieval from a given office as the rate of generation of data files 26 within that office varies due to any number of factors. This can lead to a shortage of data storage space within the data handling resources of a given office (not shown in Fig. 1) while awaiting retrieval of data files 26 by a main office (not shown in Fig. 1), further leading to an undesirable slowdown or stoppage of data file generation (i.e., work processing) within an office so affected.

Methods and apparatus in accordance with embodiments of the present invention are described hereafter.

Fig. 2 is a block diagram depicting a data handling system 100 in accordance with one embodiment of the present invention. The data handling system 100 includes a first field office (hereafter, first office) 102 and a second field office (hereafter, second office) 104. The first office 102 and the second office 104 each include a client server 106, a spooler 108, and a user computer 110. Each of the client servers 106, the spoolers 108 and the user computers 110 can include any suitable device that is normally usable in the setup, generation, handling and/or storage of data and job accounting data files (hereafter, data files) 126. One of skill in the data processing and computing arts is familiar with such typical devices 106, 108 and 110, and further elaboration is not required for purposes of understanding the present invention. Furthermore, other suitable data handling and processing devices (not shown) can also be used in conjunction with the system 100.

Each of the first and second offices 102 and 104 of the data handling system 100 further includes a job accounting appliance (hereafter, JA appliance) 112. Each JA appliance 112 is configured to receive and store data files 126 from the client server 106, the spooler 108, and the user computer 110 within the particular office 102 or 104. Furthermore, each JA appliance 112 is coupled in data communication with the Internet 128. For purposes herein, each JA appliance 112 within the first and second offices 102 and 104 is considered to be a remote entity.

The data handling system 100 also includes a main office 130. The main office 130 includes a backoffice system 132 coupled in data communication with the first office 102 and the second office 104 by way of the Internet 128. The backoffice system 132 can include any suitable data handler configured to retrieve and store data files 126 from the first office 102 and the second office 104 in accordance with the present invention. As depicted in Fig. 2, the backoffice system 132 includes a processor 134 and a memory (i.e., computer-accessible storage medium) 136. The memory 136 further includes executable program code 138 that is configured to cause the processor 134 to perform various normal operations of the backoffice system 132. Typical such operations, as performed by the processor 134 under the control of the program code 138 and in accordance with the present invention, are described in detail hereafter. The backoffice system 132 is considered to be a local entity for purposes herein. Although the field offices 102 and 104 are depicted as being in communication with the main office 130 via the Internet, these entities can also be in communication with one another via a LAN, a WAN, a private internet, or other known network communication systems.

It is to be understood that other suitable embodiments (not shown) of the backoffice system 132 can include any number of other elements and devices such as, for example, data storage devices, additional processors, input/output circuitry, operator interfaces, power supplies, etc., as required and/or desired for the respective range of normal operations associated with a particular embodiment of the backoffice system 132. Further elaboration of the backoffice system 132 is not required for purposes of understanding the present invention.

Furthermore, other embodiments (not shown) of the data handling system 100 can also be used in accordance with the present invention. Such other embodiments (not shown) can include, for example: differing configurations of the first office 102 and/or second office 104; additional similar offices coupled in data communication with the main office 130; additional data file 126 generation, handling or storage devices; printers and other imaging apparatus; etc. Varying embodiments (not shown) of the data handling system 100 can be used as required and/or desired to provide corresponding ranges of normal operations, while doing so in accordance with the teachings of the present invention. In any case, typical operation of the data handling system 100 is described in detail hereafter.

Fig. 3 is an exemplary time sequence diagram depicting a data retrieval method 200 in accordance with another embodiment of the present invention. As depicted in Fig. 3, the diagram of method 200 includes a first office timeline 222 and a second office timeline 224. Each of the timelines 222 and 224 depicts the retrieval of data files 126 at corresponding event times T0 through T8, inclusive, from the respectively associated first and second offices "OFFICE 1" and "OFFICE 2" (e.g., first and second offices 102 and 104 of Fig. 2).

Reference is now made to both Figs. 2 and 3. Typical operation under the method 200 is exemplified as follows: to begin, it is assumed that the most recent retrieval of data files 126 (Fig. 3) took place six units of time (e.g., minutes, hours, etc.) prior to time T0 for each of the first and second offices 102 and 104 (Fig. 2). Thus, the prior interval of time for each of the first and second timelines 222 and 224 (Fig. 3) is six units (i.e., TI = 6).

Then, at time T0 (Fig. 3), the processor 134 (Fig. 2), under the control of the program code 138, causes the backoffice system 132 to retrieve five data files 126 (Fig. 3) from the JA appliance 112 (Fig. 2) of the first office 102, and no data files 126 (Fig. 3) from the JA appliance 112 of the second office 104 (Fig. 2), by way of the Internet 128. After the retrieval is complete, the JA appliance 112 within the first office 102 deletes the five corresponding data files 126 (Fig. 3) stored therein.

In one embodiment of the system 100, the JA appliance 112 within the first office 102 performs the deletion of the data files 126 automatically after the retrieval. In another embodiment of the system 100, the backoffice system 132 issues a command signal causing the JA appliance 112 to perform the deletion of the data files 126. For example, the backoffice system 132 can issue a file-delete command to the JA appliance 112 after the backoffice system 132 has verified the quality of the files received. Other embodiments of the system 100 can also be used.
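The second deletion embodiment can be sketched as follows. In that embodiment, the backoffice system 132 deletes the remote files only after it has verified the files it received. The source does not say how quality is verified. Purely for illustration, this sketch assumes that the remote side can report a SHA-256 digest of each file. `RemoteEntity` and its methods `list_files`, `read`, `checksum` and `delete` are hypothetical names; the source does not define this interface. `FakeAppliance` is an in-memory stand-in used only to exercise the sketch.

```python
import hashlib
from typing import Dict, List, Protocol


class RemoteEntity(Protocol):
    """Hypothetical interface to a JA appliance (not defined by the source)."""
    def list_files(self) -> List[str]: ...
    def read(self, name: str) -> bytes: ...
    def checksum(self, name: str) -> str: ...   # assumed SHA-256 hex digest
    def delete(self, name: str) -> None: ...


def retrieve_then_delete(remote: RemoteEntity) -> Dict[str, bytes]:
    """Copy each waiting file and issue the delete command only after the
    local copy matches the remote digest (assumed verification scheme)."""
    received: Dict[str, bytes] = {}
    for name in remote.list_files():
        data = remote.read(name)
        if hashlib.sha256(data).hexdigest() == remote.checksum(name):
            received[name] = data
            remote.delete(name)   # file-delete command after verification
        # a file that fails verification stays on the remote for the next retrieval
    return received


class FakeAppliance:
    """In-memory stand-in used only to exercise the sketch."""
    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = dict(files)
        self.corrupt: set = set()   # names whose transfer is simulated as damaged

    def list_files(self) -> List[str]:
        return sorted(self.files)   # a copy, so deleting while iterating is safe

    def read(self, name: str) -> bytes:
        data = self.files[name]
        return data + b"!" if name in self.corrupt else data

    def checksum(self, name: str) -> str:
        return hashlib.sha256(self.files[name]).hexdigest()

    def delete(self, name: str) -> None:
        del self.files[name]


if __name__ == "__main__":
    appliance = FakeAppliance({"a.ja": b"job a", "b.ja": b"job b"})
    appliance.corrupt.add("b.ja")
    got = retrieve_then_delete(appliance)
    print(sorted(got), appliance.list_files())
```

Because deletion happens only after a successful comparison, a damaged transfer leaves the remote copy in place for the next retrieval.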



Next, the processor 134 (Fig. 2) calculates new (i.e., redefines) retrieval intervals of time for each of the first and second offices 102 and 104 in accordance with the following functional steps:

1) Divide the prior interval of time by the quantity of data retrieved from each JA appliance 112 at time T0 to define a data creation period for each office 102 and 104.

For the first office 102, the data creation period is: (6 time units) / (5 files) = 6/5, or 1.2 time units per file.

For the second office 104, the data creation period is: (6 time units) / (0 files) = undefined, so no data creation period is calculated and the prior interval of time is retained (no change).

2) Multiply the data creation period for each of the first and second offices 102 and 104 by a predefined, substantially optimum retrieval quantity for each of the JA appliances 112, to redefine the interval of time for retrieving data from each respective office 102 and 104. For purposes of example, it is assumed that a quantity of three files is optimum for each JA appliance 112.



For the first office 102, the redefined interval of time is: (6/5 time units per file)(3 files optimum) = 3.6 units of time. For purposes of example, it is assumed that the processor 134 rounds up to four units of time (i.e., TI = 4).

For the second office 104, the data creation period is presently undefined, so the program code 138 causes the processor 134 to select the prior six units of time as the 'redefined' interval of time (i.e., TI = 6).
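Steps 1) and 2) can be expressed as a small function. This is a sketch, not the program code 138 itself. The names `data_creation_period` and `redefine_interval` are illustrative, and rounding up to whole time units follows the example above. Exact fractions are used so that a value such as 6/5 × 3 is not disturbed by floating-point error before rounding.

```python
import math
from fractions import Fraction
from typing import Optional


def data_creation_period(prior_interval: int, files_retrieved: int) -> Optional[Fraction]:
    """Step 1): time units per file, or None (undefined) when no files were retrieved."""
    if files_retrieved == 0:
        return None
    return Fraction(prior_interval, files_retrieved)


def redefine_interval(prior_interval: int, files_retrieved: int, optimum_quantity: int) -> int:
    """Step 2): creation period x optimum quantity, rounded up to whole time units.
    When the creation period is undefined, the prior interval is kept unchanged."""
    period = data_creation_period(prior_interval, files_retrieved)
    if period is None:
        return prior_interval
    return math.ceil(period * optimum_quantity)


if __name__ == "__main__":
    # Values at time T0 from the example: prior interval 6, optimum quantity 3.
    print(data_creation_period(6, 5))   # 6/5
    print(redefine_interval(6, 5, 3))   # 4 (3.6 rounded up)
    print(redefine_interval(6, 0, 3))   # 6 (undefined period, interval unchanged)
```

The same function also reproduces steps 3) and 4) that follow: `redefine_interval(4, 3, 3)` gives 4 and `redefine_interval(6, 2, 3)` gives 9.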

The processor 134 (Fig. 2) of the backoffice system 132 then waits for each of the redefined intervals of time to expire. As depicted, the processor 134 (Fig. 2) causes the backoffice system 132 to retrieve three data files 126 (Fig. 3) at time T1 from the JA appliance 112 (Fig. 2) of the first office 102, and two data files 126 (Fig. 3) at time T2 from the JA appliance 112 (Fig. 2) of the second office 104. The program code 138 then causes the processor 134 to recalculate (redefine) the intervals of time for each of the JA appliances 112 (i.e., first and second offices 102 and 104) in accordance with steps 1) and 2) above. Thus, the processor 134 redefines the intervals of time as follows:

3) For the first office 102: (4 units) / (3 files) = 4/3 time units per file; (4/3 time units per file)(3 files optimum) = 4 units of time (i.e., TI = 4).

4) For the second office 104: (6 units) / (2 files) = 3 time units per file; (3 time units per file)(3 files optimum) = 9 units of time (i.e., TI = 9).

The processor 134 (Fig. 2) then waits for each of the redefined intervals of time to expire, at times T3 (Fig. 3) and T5, respectively. Furthermore, the JA appliances 112 (Fig. 2) within the first and second offices 102 and 104 delete their respective copies of the data files 126 (Fig. 3) retrieved at times T1 and T2. The method 200 then continues in a generally iterative 'retrieve, calculate and wait' process, substantially as described above and as depicted in Fig. 3. It is noted that data files 126 (Fig. 3) are also retrieved at a time T4 from the JA appliance 112 (Fig. 2) of the first office 102, which occurs prior to the time T5 (Fig. 3). This is due to the relatively shorter intervals of time that are waited between retrievals from the first office 102 (Fig. 2) versus those of the second office 104 (i.e., TI = 4 versus TI = 9 in Fig. 3).
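Because each office keeps its own interval, the retrieval events of Fig. 3 interleave on a single timeline. The sketch below replays the retrieval counts stated above for times T0, T1 and T2. It orders the events with a priority queue (Python's `heapq`), assuming that retrievals start at time zero and that intervals are rounded up to whole units as in the example. It reproduces T1 = 4 and T2 = 6, together with next retrieval times of 8 (T3) and 15 (T5). The counts at later times are not stated in the text, so the replay stops there. `replay` and `next_interval` are illustrative names.

```python
import heapq
import math
from fractions import Fraction
from typing import Dict, List, Tuple


def next_interval(prior: int, retrieved: int, optimum: int) -> int:
    """Steps 1) and 2): keep the prior interval when nothing was retrieved,
    otherwise (prior / retrieved) * optimum, rounded up to whole units."""
    if retrieved == 0:
        return prior
    return math.ceil(Fraction(prior, retrieved) * optimum)


def replay(counts: Dict[str, List[int]], initial_interval: int,
           optimum: int) -> List[Tuple[int, str, int, int]]:
    """Return (time, office, files retrieved, next interval) events in time order.
    A heap keyed on each office's next due time lets offices run independently."""
    queue = [(0, office, 0, initial_interval) for office in sorted(counts)]
    heapq.heapify(queue)
    events = []
    while queue:
        t, office, k, prior = heapq.heappop(queue)
        n = counts[office][k]
        interval = next_interval(prior, n, optimum)
        events.append((t, office, n, interval))
        if k + 1 < len(counts[office]):          # more known retrievals for this office
            heapq.heappush(queue, (t + interval, office, k + 1, interval))
    return events


if __name__ == "__main__":
    stated = {"OFFICE 1": [5, 3], "OFFICE 2": [0, 2]}   # counts at T0/T1 and T0/T2
    for t, office, n, interval in replay(stated, initial_interval=6, optimum=3):
        print(f"t={t:>2} {office}: {n} files, next retrieval at t={t + interval}")
```

A heap keeps the sketch independent of the number of offices: each office simply schedules its own next due time, and the earliest one is always served first.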

Thus, the method 200 provides for dynamically redefining the interval of time that is waited before a subsequent retrieval of data files 126 (i.e., quantity of data) from the first office 102 or the second office 104, as a function of the prior interval of time, the quantity of data just retrieved, and the substantially optimum (i.e., predefined) retrieval quantity of data.

In this way, the method 200 of the present invention generally provides for the repetitive gathering of data (i.e., data files 126) from remote entities (i.e., JA appliances 112 within offices 102 and 104) in correspondence to the rate at which data is generated within the particular remote entity. In doing so, the method 200 substantially prevents excessive delays in retrieving the data waiting within the respective remote entities. Thus, excessive data accumulation within the corresponding remote entities is also substantially prevented.

Furthermore, the method 200 also substantially prevents over-aggressive data retrieval by permitting the redefinition of the optimum retrieval quantity. For example, if the storage resources of a particular JA appliance 112 permit, then the corresponding optimum retrieval quantity can be suitably increased by way, for example, of a user input to the backoffice system 132 (as facilitated by a corresponding embodiment of the program code 138, etc.). This results in a corresponding increase in the interval of time between consecutive data retrievals (i.e., decreased data retrieval frequency) by the backoffice system 132 (i.e., local entity).

Such a reduction in data retrieval frequency can be desirable, for example, in circumstances where those responsible for a local entity are paying a per-usage or per-access fee to an Internet service provider (or other network administration agency). It can likewise be desirable in any other situation where reduced network (i.e., Internet) access frequency is generally favorable. In any case, a re-definable optimum retrieval quantity provides for user-selectable system performance adjustment and tuning.
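The effect of raising the optimum retrieval quantity can be checked numerically. The sketch below assumes an idealized office that produces one file every `hours_per_file` hours at a steady rate, with only whole files counted at each retrieval. It applies the same redefinition repeatedly until the interval settles, then reports network accesses per 24-hour day. The function name `settled_interval` and the steady-rate model are assumptions for illustration, not part of the described method.

```python
import math
from fractions import Fraction


def settled_interval(start_interval: int, hours_per_file: int,
                     optimum_quantity: int, rounds: int = 10) -> int:
    """Iterate interval -> ceil((interval / files) * Q) for an idealized office
    that creates one file every `hours_per_file` hours (assumed steady rate)."""
    interval = start_interval
    for _ in range(rounds):
        files = interval // hours_per_file        # whole files created while waiting
        if files:                                 # no files: interval left unchanged
            interval = math.ceil(Fraction(interval, files) * optimum_quantity)
    return interval


if __name__ == "__main__":
    for q in (3, 6):
        ti = settled_interval(start_interval=6, hours_per_file=2, optimum_quantity=q)
        print(f"Q={q}: interval {ti} h, {24 // ti} retrievals per day")
```

With one file every two hours, doubling Q from 3 to 6 doubles the settled interval from 6 to 12 hours. It also halves the number of daily retrievals from 4 to 2.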

Fig. 4 is a flowchart depicting a data retrieval method 300 in accordance with still another embodiment of the present invention. The method 300 is substantially similar to the method 200 of Fig. 3 as described above. While the method 300 describes particular steps and order of execution, it is to be understood that other methods including other steps and orders of execution can also be used in accordance with the present invention. For clarity of understanding, the method 300 will be described within the context of the data handling system 100 of Fig. 2.

In step 302 (Fig. 4), the processor 134 (Fig. 2), which is executing the program code 138, determines the most recently waited interval of time (TI) associated with the JA appliance 112 (i.e., remote entity) of the first office 102.

In step 304 (Fig. 4), the processor 134 (Fig. 2) causes a quantity (N) of data files 126 to be retrieved (i.e., transferred) from the JA appliance 112 of the first office 102 to the backoffice system 132 (i.e., local entity) of the main office 130 by way of the Internet 128. The processor 134 thereafter signals the JA appliance 112 of the first office 102 to delete the remote instance of the data files 126 so as to recover the corresponding storage space within the JA appliance 112.

In step 306 (Fig. 4), the processor 134 (Fig. 2) divides the interval of time (TI) by the quantity (N) of data files 126 to calculate a data creation period (CP) for the remote entity (i.e., the JA appliance 112 of the first office 102). Thus, (CP) = (TI) / (N). In the event that no data files 126 were retrieved (i.e., (N) = zero), the processor 134 assigns a default value of zero to the data creation period (CP).

In step 308 (Fig. 4), the processor 134 (Fig. 2) multiplies the data creation period (CP) by a predefined optimum retrieval quantity (Q) to redefine the interval of time (TI) that will be used for the next iteration of data file 126 retrieval. Thus, (TI) = (CP)(Q). In the event that (CP) = zero as a result of step 306 above, the processor 134 by default maintains the existing value of (TI) as determined in step 302 above.

In step 310 (Fig. 4), the processor 134 (Fig. 2) waits for the interval of time (TI) as redefined in step 308 above.

In step 312 (Fig. 4), the processor 134 (Fig. 2) determines whether additional data retrieval is required. Such a determination can be based, for example, on an operator input to the backoffice system 132, on time-of-day scheduling, on a data retrieval error detection or other strategy, etc. If the processor 134 determines that additional data retrieval is required, then the method 300 (Fig. 4) returns to step 304 and begins another iteration of steps 304 through 312, inclusive. If the processor 134 (Fig. 2) determines that no additional data retrieval is required, then the method 300 (Fig. 4) is terminated.
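Steps 302 through 312 map onto a simple loop. In this sketch, the hooks `retrieve`, `delete_remote` and `more_required` are hypothetical stand-ins for three operations: the transfer of step 304, the delete signal sent to the JA appliance 112, and the decision of step 312. `sleep` is injectable so that the waits can be inspected. Unlike the method 200 example, no rounding is applied, since the description of the flowchart does not call for it.

```python
import time
from typing import Callable


def method_300(initial_interval: float, optimum_quantity: int,
               retrieve: Callable[[], int],
               delete_remote: Callable[[], None],
               more_required: Callable[[], bool],
               sleep: Callable[[float], None] = time.sleep) -> float:
    """Sketch of Fig. 4 for a single remote entity. `retrieve`, `delete_remote`
    and `more_required` are hypothetical hooks, not interfaces from the source.
    Returns the last interval waited when step 312 ends the method."""
    ti = initial_interval                # step 302: most recently waited interval (TI)
    while True:
        n = retrieve()                   # step 304: transfer N data files ...
        delete_remote()                  # ... then signal deletion of the remote copies
        cp = ti / n if n else 0          # step 306: CP = TI / N, or 0 when N is zero
        if cp:                           # step 308: TI = CP * Q; keep TI when CP is 0
            ti = cp * optimum_quantity
        sleep(ti)                        # step 310: wait for the redefined TI
        if not more_required():          # step 312: stop, or loop back to step 304
            return ti


if __name__ == "__main__":
    counts = iter([3, 2, 0])             # N at three successive retrievals
    flags = iter([True, True, False])    # step 312 answers
    waits: list = []
    final = method_300(6, 3, lambda: next(counts), lambda: None,
                       lambda: next(flags), sleep=waits.append)
    print(waits, final)
```

In this run the waits are 6, 9 and 9 units. The last retrieval returns no files, so (CP) is zero and the interval is held at its previous value.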

In this way, the method 300 of the present invention provides for the retrieval of data files (i.e., quantities of data) by a local entity from a remote entity at dynamically redefined intervals of time. Each interval is a function of the most recently waited interval of time, the quantity of data just retrieved, and a predefined (and selectively re-definable) optimum retrieval quantity. Thus, the method 300 generally adapts the rate at which data is retrieved from a remote entity to the rate at which that data is being generated by the remote entity. In turn, the method 300 substantially eliminates both excessive data accumulation within the remote entity and unnecessarily frequent network access (i.e., data retrieval by way of the Internet).

Furthermore, respective embodiments of the method 200 of Fig. 3 and the method 300 of Fig. 4 can use predetermined (i.e., optimum) retrieval quantities that are based upon a predetermined optimization formula. One example of such a formula is as follows:

5) Optimum file retrieval count = ((optimum retrieval packet size - overhead) / file size);

wherein, for example: optimum retrieval packet size = 4,000 bytes; overhead = 100 bytes; and file size = 500 bytes.

Under such an exemplary arrangement, the optimum file retrieval count (i.e., the number of data files 126 of Fig. 3) would be: ((4,000 - 100) / 500) = 7.8 files, rounded up to eight files per retrieval. Thus, the optimum data retrieval quantity would be: (4,000 - 100) = 3,900 bytes of data per retrieval. Note that overhead generally refers to information required for executing the data retrieval such as, for example, routing information, identity and/or verification stamping, encryption information, time and date stamps, etc. Thus, overhead is not generally considered to be a part of the data being retrieved (i.e., the data files 126 of Fig. 3) within a given data packet. Other predetermined optimization formulas can also be used in correspondence with varying embodiments of the methods 200 and 300 described above.
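Formula 5) can be written directly. This sketch uses integer arithmetic to avoid floating-point rounding. The `round_up` flag is an illustrative addition: the default reproduces the rounding of the worked example.

```python
def optimum_file_count(packet_size: int, overhead: int, file_size: int,
                       round_up: bool = True) -> int:
    """Formula 5): (optimum retrieval packet size - overhead) / file size, in whole files.

    round_up=True matches the worked example (7.8 -> 8 files); round_up=False
    keeps every file within the packet's payload."""
    payload = packet_size - overhead            # bytes available for data files
    if round_up:
        return -(-payload // file_size)         # ceiling division on integers
    return payload // file_size


if __name__ == "__main__":
    print(optimum_file_count(4000, 100, 500))                  # 8
    print(optimum_file_count(4000, 100, 500, round_up=False))  # 7
```

Rounding up gives the eight files of the example. Rounding down gives the largest number of whole 500-byte files that fit within the 3,900-byte payload: 7 × 500 = 3,500 bytes fit, whereas 8 × 500 = 4,000 bytes does not.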

While the above methods and apparatus have been described in language more or less specific as to structural and methodical features, it is to be understood, however, that they are not limited to the specific features shown and described, since the means herein disclosed comprise preferred forms of putting the invention into effect. The methods and apparatus are, therefore, claimed in any of their forms or modifications within the proper scope of the appended claims appropriately interpreted in accordance with the doctrine of equivalents.